Abstract: We measure the influence of individual observations on the sequence of the hidden states of the Hidden Markov Model (HMM) by means of the Kullback-Leibler distance (KLD). Namely, we consider the KLD between the conditional distribution of the hidden states' chain given the complete sequence of observations and the conditional distribution of the hidden chain given all the observations but the one under consideration. We introduce a linear complexity algorithm for computing the influence of all the observations. As an illustration, we investigate the application of our algorithm to the problem of detecting meaningful observations in HMM data series.

Index Terms: Hidden Markov models, relative entropy, forward-backward algorithm, outlier detection, local outlier factor

I. Introduction 

The Hidden Markov Model (HMM) is a standard tool in many applications, including signal processing and speech recognition [1], [2], [3] and computational biology [4]. In a typical HMM, let $S_{1:n} = (S_1, \dots, S_n)$ be the Markov sequence of hidden variables (or states) and $X_{1:n} = (X_1, \dots, X_n)$ the sequence of observation variables. In this letter we address the problem of measuring the influence of an observation $X_j = x_j$ on the distribution of the hidden sequence $S_{1:n}$.

We start by fixing notation. For simplicity's sake, we consider homogeneous HMMs and denote the parameters of the model by $P(X_i = x \mid S_i = s) = \beta(s, x)$ (emissions), $P(S_i = s \mid S_{i-1} = r) = \alpha(r, s)$ (transitions) and $P(S_1 = s) = \gamma(s)$. The model is fully specified by the conditional dependencies among the variables depicted in Fig. 1, which determine the following factorization of the joint probability distribution:

$$P(X_{1:n} = x_{1:n}, S_{1:n} = s_{1:n}) = \gamma(s_1) \prod_{i=2}^{n} \alpha(s_{i-1}, s_i) \prod_{i=1}^{n} \beta(s_i, x_i),$$

where $s_i$, $x_i$ range over the sets of all possible outcomes of $S_i$ and $X_i$ (for continuous variables, simply replace probabilities with densities and sums with integrals). For simplicity of notation, in most equations we do not write the outcomes of the variables explicitly.
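The factorization translates directly into code. The following minimal Python sketch, assuming a discrete HMM whose states and observation symbols are encoded as 0-based integers, evaluates the joint probability of one state path and one observation sequence; the function name and the toy parameters are hypothetical and serve only as an illustration.

```python
import itertools

def joint_probability(states, obs, gamma, alpha, beta):
    """P(X_{1:n} = obs, S_{1:n} = states) for a homogeneous discrete HMM.

    gamma[s]    -- initial probability P(S_1 = s)
    alpha[r][s] -- transition probability P(S_i = s | S_{i-1} = r)
    beta[s][x]  -- emission probability P(X_i = x | S_i = s)
    States and symbols are 0-based integer indices.
    """
    p = gamma[states[0]] * beta[states[0]][obs[0]]
    for i in range(1, len(states)):
        p *= alpha[states[i - 1]][states[i]] * beta[states[i]][obs[i]]
    return p

# Hypothetical two-state, two-symbol model, for illustration only.
gamma = [0.6, 0.4]
alpha = [[0.9, 0.1], [0.2, 0.8]]
beta = [[0.7, 0.3], [0.1, 0.9]]

# 0.6*0.7 * (0.9*0.3) * (0.1*0.9) = 0.010206, up to rounding
p = joint_probability([0, 0, 1], [0, 1, 1], gamma, alpha, beta)

# Sanity check: summing over every (state path, observation sequence) pair gives 1.
n = 3
total = sum(joint_probability(s, x, gamma, alpha, beta)
            for s in itertools.product(range(2), repeat=n)
            for x in itertools.product(range(2), repeat=n))
```

The normalization check is a quick way to confirm that the three parameter sets are consistent before running any inference on them.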

An important inference problem in HMMs is computing the conditional (posterior) distribution of the hidden sequence given an evidence. In standard applications, the evidence is a complete instantiation of the observable sequence, $\mathcal{E} = \{X_{1:n} = x_{1:n}\}$ for some $x_{1:n}$. For a fixed $j \in \{1, \dots, n\}$, we denote by $\mathcal{E}_{-j}$ the evidence $\{X_{-j} = x_{-j}\}$, where $X_{-j}$ denotes the sequence of all the observation variables except $X_j$.

Our suggestion for measuring the influence of an observation $X_j = x_j$ is based on the following question: what is the contribution of $X_j = x_j$ to the posterior distribution $S_{1:n} \mid \{X_{1:n} = x_{1:n}\}$ of the hidden sequence given the complete sequence of observations? That is, how dissimilar are the posterior distributions $P(S_{1:n} \mid \mathcal{E})$ and $P(S_{1:n} \mid \mathcal{E}_{-j})$? The more distant these two posterior distributions are, the more influential $X_j = x_j$ is.

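The Kullback-Leibler distance (KLD), or relative entropy, arises in many applications as an appropriate measure of the distance between two probability distributions [5], [6]. Following [7] and [8] (in the context of linear regression), we suggest measuring the influence of $X_j = x_j$ through the KLD

$$K_j := \mathrm{KL}\big(P(S_{1:n} \mid \mathcal{E}_{-j}) \,\|\, P(S_{1:n} \mid \mathcal{E})\big) = \sum_{s_{1:n}} P(S_{1:n} \mid \mathcal{E}_{-j}) \log \frac{P(S_{1:n} \mid \mathcal{E}_{-j})}{P(S_{1:n} \mid \mathcal{E})}.$$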
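To make the definition concrete, the following Python sketch evaluates $K_j$ literally, by enumerating all $m^n$ hidden paths of a small discrete HMM. It is exponential in $n$ and only meant as a reference for tiny examples; the toy parameters and function names are hypothetical, and indices are 0-based. Conditioning on $\mathcal{E}_{-j}$ amounts to dropping the emission factor of position $j$, because $\sum_y \beta(s, y) = 1$.

```python
import itertools
import math

def path_weight(states, obs, gamma, alpha, beta, skip=None):
    """Joint weight P(S_{1:n} = states, evidence).

    With skip=j the emission factor of position j is dropped, which is what
    summing X_j out gives (sum_y beta[s][y] = 1), i.e. the evidence E_{-j}.
    """
    w = gamma[states[0]]
    for i, s in enumerate(states):
        if i > 0:
            w *= alpha[states[i - 1]][s]
        if i != skip:
            w *= beta[s][obs[i]]
    return w

def kld_brute_force(j, obs, gamma, alpha, beta):
    """K_j = KL(P(S_{1:n} | E_{-j}) || P(S_{1:n} | E)) over all m**n paths."""
    paths = list(itertools.product(range(len(gamma)), repeat=len(obs)))
    full = [path_weight(p, obs, gamma, alpha, beta) for p in paths]
    loo = [path_weight(p, obs, gamma, alpha, beta, skip=j) for p in paths]
    z_full, z_loo = sum(full), sum(loo)
    return sum((q / z_loo) * math.log((q / z_loo) / (f / z_full))
               for f, q in zip(full, loo) if q > 0)

# Hypothetical toy model and observation sequence.
gamma = [0.6, 0.4]
alpha = [[0.9, 0.1], [0.2, 0.8]]
beta = [[0.7, 0.3], [0.1, 0.9]]
obs = [0, 0, 1, 0]
K = [kld_brute_force(j, obs, gamma, alpha, beta) for j in range(len(obs))]
```

When all rows of $\beta$ coincide, $X_j$ carries no information on $S_j$ and the sketch returns $K_j = 0$, as expected from the definition.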



By definition, $K_j$ measures the influence of observation $X_j$ on the posterior distribution of the hidden states rather than on the parameter estimates. $K_j$ is therefore an appropriate influence measure when the quantity of interest is the posterior distribution, as is often the case in practical applications such as speech recognition [3], data segmentation [9], bioinformatics [10] and genetics [11].

In this letter we address the problem of efficiently computing the vector $(K_j)_{j=1,\dots,n}$ of all the KL distances, one for each observation. To the best of our knowledge, the computation of the KLD between the posterior distributions of the hidden sequence of an HMM conditioned on two distinct evidences has not been studied before; the main efforts have rather aimed at efficiently computing the KLD between the distributions of the observation sequence of an HMM with respect to two distinct sets of parameters [6], [12], [13].

A straightforward computation of $(K_j)_{j=1,\dots,n}$ based on the standard forward-backward algorithm for HMMs leads to a quadratic complexity in the number of observations; our main contribution is a linear-time algorithm based on simple recursive formulae.

As an illustration, we apply our algorithm to a time series 
of temperature changes and discuss the practical interest of 
the suggested influence measure. 
Fig. 1: HMM topology. $S_j$: hidden variable, $X_j$: observed variable.



II. Computation of the Influence Measure

We start by recalling that the posterior distribution $P(S_{1:n} \mid \mathcal{E})$ of the hidden sequence given the standard evidence is a heterogeneous Markov sequence whose transition probabilities are computed in $O(nm^2)$ steps with the standard forward-backward algorithm [2], where $m$ is the number of possible outcomes of each hidden variable. The forward and backward quantities are defined as $F_i(s) := P(X_{1:i}, S_i = s)$ and $B_i(s) := P(X_{i+1:n} \mid S_i = s)$, and are computed recursively with Eqs. (1) and (2) below.

Similarly to $P(S_{1:n} \mid \mathcal{E})$, the computation of $P(S_{1:n} \mid \mathcal{E}_{-j})$ for a fixed $j$ requires $O(nm^2)$ steps: it is easy to adapt the forward-backward algorithm to $\mathcal{E}_{-j}$ by simply marginalizing out the variable $X_j$ in all the formulae and propagating the new forward and backward quantities thus obtained.

It is straightforward to compute the KLD between two heterogeneous Markov chains recursively; as a consequence, a direct approach based on standard recursions leads to an $O(nm^2)$ time complexity for computing $K_j$ for a fixed $j$. However, the required marginalization of the forward and backward quantities depends on $j$, so a distinct set of forward and backward quantities must be computed for each $j$. As a consequence, the resulting complexity for computing the vector $(K_j)_{j=1,\dots,n}$ is $O(n^2 m^2)$.

Our principal contribution is a set of new recursive formulae that reduce this complexity to $O(nm^2)$. We start with two technical lemmas that lead to our original algorithm.

Lemma 1: For an arbitrary fixed $j \in \{1, \dots, n\}$:

$$K_j = \sum_{s_j} P(S_j \mid \mathcal{E}_{-j}) \log \frac{P(S_j \mid \mathcal{E}_{-j})}{P(S_j \mid \mathcal{E})}.$$

Proof: We start by observing that the following factorizations hold:

$$P(S_{1:n} \mid \mathcal{E}_{-j}) = P(S_j \mid \mathcal{E}_{-j})\, P(S_{1:j-1} \mid S_j, \mathcal{E}_{-j})\, P(S_{j+1:n} \mid S_j, \mathcal{E}_{-j})$$

and

$$P(S_{1:n} \mid \mathcal{E}) = P(S_j \mid \mathcal{E})\, P(S_{1:j-1} \mid S_j, \mathcal{E})\, P(S_{j+1:n} \mid S_j, \mathcal{E}).$$

The key point is that in the last equation we have

$$P(S_{j+1:n} \mid S_j, \mathcal{E}) = P(S_{j+1:n} \mid S_j, \mathcal{E}_{-j}) \quad \text{and} \quad P(S_{1:j-1} \mid S_j, \mathcal{E}) = P(S_{1:j-1} \mid S_j, \mathcal{E}_{-j}),$$

because $S_{j+1:n}$ and $X_j$ are conditionally independent given $S_j$, and $S_{1:j-1}$ and $X_j$ are conditionally independent given $S_j$ (see Fig. 1). Substituting both factorizations into the definition of $K_j$, all factors except $P(S_j \mid \cdot)$ cancel inside the logarithm, and therefore

$$K_j = \sum_{s_j} P(S_j \mid \mathcal{E}_{-j}) \log \frac{P(S_j \mid \mathcal{E}_{-j})}{P(S_j \mid \mathcal{E})} \sum_{s_{1:j-1}} P(S_{1:j-1} \mid S_j, \mathcal{E}_{-j}) \sum_{s_{j+1:n}} P(S_{j+1:n} \mid S_j, \mathcal{E}_{-j}),$$

where the last two sums are equal to 1. ∎

As a consequence of this lemma, the key to computing $(K_j)_{j=1,\dots,n}$ efficiently is an efficient computation of the factors $P(S_j \mid \mathcal{E})$ and $P(S_j \mid \mathcal{E}_{-j})$ for all $j = 1, \dots, n$. For a given $j$, $P(S_j \mid \mathcal{E})$ can be computed in $O(nm^2)$ steps using the standard forward-backward algorithm: $P(S_j = s, \mathcal{E}) = F_j(s) B_j(s)$ and hence $P(S_j = s \mid \mathcal{E}) \propto F_j(s) B_j(s)$, where the standard forward and backward quantities are computed recursively with

$$F_i(s) = \sum_r F_{i-1}(r)\, \alpha(r, s)\, \beta(s, x_i), \quad i = 2, \dots, n, \qquad F_1(s) = \gamma(s)\, \beta(s, x_1), \tag{1}$$

$$B_{i-1}(r) = \sum_s \alpha(r, s)\, \beta(s, x_i)\, B_i(s), \quad i = n, \dots, 2, \qquad B_n(s) = 1. \tag{2}$$
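Recursions (1) and (2), and the quadratic baseline discussed at the start of this section, can be sketched in Python for a discrete HMM. Assumptions: emissions take values in a finite alphabet, all indices are 0-based (position `j` in the code is observation $j+1$ in the text), and no rescaling is applied, so the sketch underflows on long sequences. For the baseline it combines Lemma 1 with one marginalized forward-backward pass per $j$, rather than the recursive Markov-chain KLD mentioned in the text; both routes cost $O(n^2 m^2)$. Function names are hypothetical.

```python
import math

def forward_backward(obs, gamma, alpha, beta, skip=None):
    """F_i(s) = P(X_{1:i}, S_i = s) and B_i(s) = P(X_{i+1:n} | S_i = s), 0-based i.

    With skip=j the emission factor at position j is set to 1, which
    marginalises X_j out (sum_y beta[s][y] = 1). No rescaling: short inputs only.
    """
    n, m = len(obs), len(gamma)
    e = [[1.0 if i == skip else beta[s][obs[i]] for s in range(m)] for i in range(n)]
    F = [[gamma[s] * e[0][s] for s in range(m)]]          # F_1(s) = gamma(s) beta(s, x_1)
    for i in range(1, n):                                  # recursion (1)
        F.append([sum(F[i - 1][r] * alpha[r][s] for r in range(m)) * e[i][s]
                  for s in range(m)])
    B = [[1.0] * m for _ in range(n)]                      # B_n(s) = 1
    for i in range(n - 1, 0, -1):                          # recursion (2)
        B[i - 1] = [sum(alpha[r][s] * e[i][s] * B[i][s] for s in range(m))
                    for r in range(m)]
    return F, B

def marginals(F, B):
    """P(S_i = s | evidence), proportional to F_i(s) B_i(s)."""
    out = []
    for Fi, Bi in zip(F, B):
        w = [f * b for f, b in zip(Fi, Bi)]
        z = sum(w)
        out.append([v / z for v in w])
    return out

def naive_influences(obs, gamma, alpha, beta):
    """Quadratic baseline: one extra forward-backward pass per j, O(n^2 m^2)."""
    post = marginals(*forward_backward(obs, gamma, alpha, beta))
    K = []
    for j in range(len(obs)):
        loo = marginals(*forward_backward(obs, gamma, alpha, beta, skip=j))[j]
        # Lemma 1: only the two marginal distributions at position j are needed.
        K.append(sum(q * math.log(q / p) for q, p in zip(loo, post[j]) if q > 0))
    return K

# Hypothetical toy model and observations.
gamma = [0.6, 0.4]
alpha = [[0.9, 0.1], [0.2, 0.8]]
beta = [[0.7, 0.3], [0.1, 0.9]]
obs = [0, 0, 1, 0]
F, B = forward_backward(obs, gamma, alpha, beta)
K = naive_influences(obs, gamma, alpha, beta)
```

Because the emission factor at position `j` is simply set to 1, the same routine serves both evidences $\mathcal{E}$ and $\mathcal{E}_{-j}$; the price is one full pass per observation, which is exactly the cost the lemmas are meant to remove.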



We show a similar result for $P(S_j \mid \mathcal{E}_{-j})$:

Lemma 2: For an arbitrary fixed $j \in \{1, \dots, n\}$:

$$P(S_j = s, \mathcal{E}_{-j}) = F^*_j(s)\, B_j(s),$$

where $B_j$ is the standard backward quantity and $F^*_j$ is computed from the standard forward quantities with

$$F^*_i(s) = \sum_r F_{i-1}(r)\, \alpha(r, s), \quad i = 2, \dots, n, \tag{3}$$

with $F^*_1(s) = \gamma(s)$. Moreover, the time complexity for computing $P(S_j = s \mid \mathcal{E}_{-j}) \propto F^*_j(s) B_j(s)$ for all $j = 1, \dots, n$ is $O(nm^2)$.

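Proof: For a given $j$, we have

$$P(S_j = s, \mathcal{E}_{-j}) = \sum_y P(S_j = s, \mathcal{E}_{-j}, X_j = y) = \sum_y P(S_j = s, \mathcal{E}^y),$$

where $\mathcal{E}^y$ is the standard evidence $\{X_{-j} = x_{-j}, X_j = y\}$. For each $y$ there is a distinct set of standard forward and backward quantities $F^y_i, B^y_i$; however, it is easy to see that $F^y_i = F_i$ for $i \le j-1$ and $B^y_i = B_i$ for $i \ge j$. It follows that

$$\sum_y P(S_j = s, \mathcal{E}^y) = \sum_y F^y_j(s)\, B^y_j(s) = B_j(s) \sum_y F^y_j(s) = B_j(s) \sum_y \sum_r F_{j-1}(r)\, \alpha(r, s)\, \beta(s, y) = B_j(s) \sum_r F_{j-1}(r)\, \alpha(r, s),$$

since $\sum_y \beta(s, y) = 1$ (for $j = 1$, the same argument gives $\sum_y F^y_1(s) = \gamma(s)$). ∎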
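The identity of Lemma 2 can be checked numerically. The Python sketch below, assuming a discrete HMM with a finite alphabet of emitted symbols and 0-based indices, computes $F^*$ with recursion (3) and compares $F^*_j(s) B_j(s)$ with the sum over $y$ used in the proof, which reruns a full forward-backward pass for every possible value $y$ of $X_j$. Function names and parameters are hypothetical.

```python
def forward(obs, gamma, alpha, beta):
    """Standard forward quantities F_i(s) = P(X_{1:i}, S_i = s), recursion (1)."""
    m = len(gamma)
    F = [[gamma[s] * beta[s][obs[0]] for s in range(m)]]
    for i in range(1, len(obs)):
        F.append([sum(F[i - 1][r] * alpha[r][s] for r in range(m)) * beta[s][obs[i]]
                  for s in range(m)])
    return F

def backward(obs, alpha, beta):
    """Standard backward quantities B_i(s) = P(X_{i+1:n} | S_i = s), recursion (2)."""
    n, m = len(obs), len(alpha)
    B = [[1.0] * m for _ in range(n)]
    for i in range(n - 1, 0, -1):
        B[i - 1] = [sum(alpha[r][s] * beta[s][obs[i]] * B[i][s] for s in range(m))
                    for r in range(m)]
    return B

def forward_star(F, gamma, alpha):
    """Recursion (3): F*_1(s) = gamma(s), F*_i(s) = sum_r F_{i-1}(r) alpha(r, s)."""
    m = len(gamma)
    return [list(gamma)] + [[sum(F[i - 1][r] * alpha[r][s] for r in range(m))
                             for s in range(m)] for i in range(1, len(F))]

def joint_loo_by_summing(j, obs, gamma, alpha, beta):
    """P(S_j = s, E_{-j}) as in the proof: sum over y of F^y_j(s) B^y_j(s)."""
    m, k = len(gamma), len(beta[0])
    total = [0.0] * m
    for y in range(k):
        obs_y = list(obs)
        obs_y[j] = y
        Fy, By = forward(obs_y, gamma, alpha, beta), backward(obs_y, alpha, beta)
        for s in range(m):
            total[s] += Fy[j][s] * By[j][s]
    return total

# Hypothetical model with three emitted symbols.
gamma = [0.6, 0.4]
alpha = [[0.9, 0.1], [0.2, 0.8]]
beta = [[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]]
obs = [0, 2, 1, 2, 0]
F, B = forward(obs, gamma, alpha, beta), backward(obs, alpha, beta)
Fs = forward_star(F, gamma, alpha)
```

The check costs one full pass per symbol $y$, whereas $F^*$ reuses the forward quantities already computed; that reuse is what makes the overall procedure linear.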



Our main result is a straightforward consequence of the two lemmas above:

Theorem 3: For an arbitrary fixed $j \in \{1, \dots, n\}$:

$$K_j = \sum_s \frac{F^*_j(s)\, B_j(s)}{\sum_r F^*_j(r)\, B_j(r)} \log\left( \frac{F^*_j(s)}{F_j(s)} \cdot \frac{\sum_r F_j(r)\, B_j(r)}{\sum_r F^*_j(r)\, B_j(r)} \right),$$

where the quantities $(F_i)_{i=1,\dots,n}$, $(B_i)_{i=1,\dots,n}$ and $(F^*_i)_{i=1,\dots,n}$ are computed once and for all, independently of $j$, using the recursions (1), (2) and (3). The complexity of computing $(K_j)_{j=1,\dots,n}$ is $O(nm^2)$.
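Putting recursions (1), (2), (3) and Theorem 3 together gives the linear-time procedure. The following Python sketch is one possible implementation for a discrete HMM, under stated assumptions: 0-based indices, plain probabilities without the rescaling usually needed for long sequences, and strictly positive emission probabilities at the observed symbols (otherwise $K_j$ is infinite). A brute-force evaluation of the definition over all $m^n$ paths is included only as a cross-check on a tiny example; all names are hypothetical.

```python
import itertools
import math

def influences(obs, gamma, alpha, beta):
    """All K_j (0-based j) in O(n m^2) via recursions (1)-(3) and Theorem 3."""
    n, m = len(obs), len(gamma)
    F, Fs = [], []
    for i in range(n):
        pred = list(gamma) if i == 0 else [
            sum(F[i - 1][r] * alpha[r][s] for r in range(m)) for s in range(m)]
        Fs.append(pred)                                           # F*_i, recursion (3)
        F.append([pred[s] * beta[s][obs[i]] for s in range(m)])   # F_i, recursion (1)
    B = [[1.0] * m for _ in range(n)]
    for i in range(n - 1, 0, -1):                                 # recursion (2)
        B[i - 1] = [sum(alpha[r][s] * beta[s][obs[i]] * B[i][s] for s in range(m))
                    for r in range(m)]
    K = []
    for j in range(n):
        z_full = sum(F[j][r] * B[j][r] for r in range(m))   # P(E)
        z_loo = sum(Fs[j][r] * B[j][r] for r in range(m))   # P(E_{-j}), Lemma 2
        K.append(sum(Fs[j][s] * B[j][s] / z_loo
                     * math.log(Fs[j][s] / F[j][s] * z_full / z_loo)
                     for s in range(m) if Fs[j][s] * B[j][s] > 0))
    return K

def kld_brute_force(j, obs, gamma, alpha, beta):
    """Cross-check only: K_j from its definition, enumerating all m**n paths."""
    def weight(path, skip=None):
        w = gamma[path[0]]
        for i, s in enumerate(path):
            if i > 0:
                w *= alpha[path[i - 1]][s]
            if i != skip:
                w *= beta[s][obs[i]]
        return w
    paths = list(itertools.product(range(len(gamma)), repeat=len(obs)))
    full = [weight(p) for p in paths]
    loo = [weight(p, skip=j) for p in paths]
    zf, zl = sum(full), sum(loo)
    return sum(q / zl * math.log((q / zl) / (f / zf)) for f, q in zip(full, loo) if q > 0)

# Hypothetical toy model and observations.
gamma = [0.6, 0.4]
alpha = [[0.9, 0.1], [0.2, 0.8]]
beta = [[0.7, 0.3], [0.1, 0.9]]
obs = [0, 1, 1, 0, 0, 1]
K = influences(obs, gamma, alpha, beta)
```

A single forward pass yields both $F$ and $F^*$ (the latter is the forward prediction before the emission factor), and one backward pass yields $B$, so all $K_j$ come out of two sweeps plus an $O(m)$ sum per position.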
Fig. 2: KLD function $(K_j)_{j=1,\dots,n}$ for the temperature change time series. The five highest $K_j$ are: $K_{1917} = 2.96$, $K_{1915} = 2.30$, $K_{1900} = 1.82$, $K_{1898} = 1.47$, $K_{1914} = 1.46$; the corresponding data points are depicted as empty dots.



III. Application to Time Series Segmentation 

We illustrate the practical interest of our influence measure on a real dataset, namely a time series consisting of 106 annual changes in global temperature between 1880 and 1985 [14]. Following the approach suggested by [9], the dataset can be modeled with a homoscedastic HMM in which each observation follows a Gaussian distribution whose mean depends on the corresponding hidden state; we assume that there are three hidden states. We estimated the parameters of the HMM with the EM algorithm and obtained, for the three hidden Gaussian distributions, the means $\mu_1 = -0.372$, $\mu_2 = 0.069$, $\mu_3 = -0.068$ and the standard deviation $\sigma = 0.114$; moreover, the transition matrix is $\pi(i, j) = \eta/2$ if $i \neq j$ and $\pi(i, i) = 1 - \eta$, where the estimated transition rate is $\eta = 0.085$.
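Theorem 3 only involves the emission values $\beta(s, x_j)$ at the observed points, so for the Gaussian HMM above one may plug in density values, in line with the remark that densities replace probabilities for continuous variables. The Python sketch below builds the transition matrix from $\eta$, evaluates the Gaussian densities with the estimated means and common standard deviation, and applies Theorem 3. Assumptions: the initial distribution is taken to be uniform because it is not reported here, the ten observations are made up for illustration and are not the series of [14], and no rescaling is used, so much longer series may need a scaled or log-space variant. Function names are hypothetical.

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def influences_from_emissions(e, gamma, alpha):
    """Theorem 3 with e[i][s] = beta(s, x_i), a density for continuous X; 0-based i."""
    n, m = len(e), len(gamma)
    F, Fs = [], []
    for i in range(n):
        pred = list(gamma) if i == 0 else [
            sum(F[i - 1][r] * alpha[r][s] for r in range(m)) for s in range(m)]
        Fs.append(pred)                                   # F*_i, recursion (3)
        F.append([pred[s] * e[i][s] for s in range(m)])   # F_i, recursion (1)
    B = [[1.0] * m for _ in range(n)]
    for i in range(n - 1, 0, -1):                         # recursion (2)
        B[i - 1] = [sum(alpha[r][s] * e[i][s] * B[i][s] for s in range(m))
                    for r in range(m)]
    K = []
    for j in range(n):
        z_full = sum(F[j][r] * B[j][r] for r in range(m))
        z_loo = sum(Fs[j][r] * B[j][r] for r in range(m))
        K.append(sum(Fs[j][s] * B[j][s] / z_loo
                     * math.log(Fs[j][s] / F[j][s] * z_full / z_loo)
                     for s in range(m) if Fs[j][s] * B[j][s] > 0))
    return K

# Estimates reported above: three states, common sigma, symmetric transitions.
mu = [-0.372, 0.069, -0.068]
sigma, eta = 0.114, 0.085
alpha = [[1 - eta if r == s else eta / 2 for s in range(3)] for r in range(3)]
gamma = [1 / 3] * 3  # assumption: initial distribution not reported, taken uniform

# Hypothetical observations, NOT the temperature series of [14].
xs = [-0.40, -0.35, -0.38, 0.05, -0.36, -0.41, 0.07, 0.06, 0.08, -0.05]
e = [[gaussian_pdf(x, mu[s], sigma) for s in range(3)] for x in xs]
K = influences_from_emissions(e, gamma, alpha)
```

Passing an emission matrix instead of symbols keeps the recursion agnostic to the emission model: only the $n \times m$ table of $\beta(s, x_j)$ values changes between discrete and Gaussian HMMs.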

Fig. 2 shows the temperature time series together with the KLD $K_j$ for each $j$. Five years clearly have a greater influence on the posterior distribution of the hidden states: 1917, 1915, 1900, 1898 and 1914. It would be interesting to investigate why these five years are so influential, looking for either specific climatic events or possible changes in the data collection protocol.

In order to validate the findings in Fig. 2, we further investigated the effect of the five most influential observations on the posterior distribution of the segmentation by comparing the marginal posterior distributions obtained with all the observations and after removing the five most influential ones (see Fig. 3). Unsurprisingly, the most dramatic changes occur in the neighborhood of the removed data. When all the observations are taken into account, the period 1880-1920 is characterized by a long segment of negative annual temperature change interrupted by two short periods of slightly positive annual change around the years 1900 and 1914 (Fig. 3, top). When the most influential observations are not considered, these two interruptions essentially disappear (Fig. 3, bottom).

The KLD measure of influence is hence clearly effective in 
pointing out observations that have a dramatic effect on the 
posterior segmentation. These observations can be interpreted 
either as critical and particularly meaningful data or as outliers 
(i.e. observations that are not generated by the underlying 
statistical model). 
Fig. 3: Marginal posterior distributions $P(S_j \mid \text{obs})$ of the three-level segmentation considering all the observations (top) and after removing the five most influential ones (bottom). Solid black line: $P(S_j = 1 \mid \text{obs})$, i.e. $x_j$ is Gaussian with mean $\mu_1 = -0.372$; dashed red: $P(S_j = 2 \mid \text{obs})$ with $\mu_2 = 0.069$; dotted blue: $P(S_j = 3 \mid \text{obs})$ with $\mu_3 = -0.068$.



Application to Outlier Detection 

Following [15] (in the context of linear regression), we argue that the KLD-based measure of influence of an observation can also be used for outlier detection in data modeled with the HMM. Indeed, if $X_j = x_j$ is an outlier, then we expect it to have a strong influence on the posterior distribution of the hidden variables, which in turn should differ significantly from the posterior distribution of the hidden variables conditioned on all the observations but $X_j$. In other words, we expect the KLD $K_j$ to be significantly larger when $X_j = x_j$ is an outlier (an illustration supporting this assumption can be found in the Supplementary Material).

In order to explore whether the KLD is an appropriate measure for outlier detection, we considered semi-parametric simulations based on the time series of changes in global temperature described above. The original data are assumed to be free of outliers. We obtained 1000 simulations under the null hypothesis H0 (no outliers) by randomly sampling $n = 106/2 = 53$ data points from the original time series, and 1000 simulations under the alternative hypothesis H1 (presence of outliers) by sampling $n = 53$ data points from the original time series and adding Gaussian noise $\mathcal{N}(0, \delta^2)$ to each of them with probability 0.05. Hence, the average number of outliers in each H1 simulation is $0.05 \cdot n = 2.65$.
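A minimal Python sketch of one simulated replicate under H0 or H1 follows. The text does not specify how the 53 points are drawn; here we assume sampling without replacement with the sampled points kept in time order, and the returned positions play the role of the years $t_j$. The placeholder series stands in for the 106 temperature changes, which are not reproduced here; the function name is hypothetical.

```python
import random

def simulate(series, n=53, outliers=False, delta=2.0, p_outlier=0.05, rng=None):
    """Return (t, x): n positions and values drawn from `series`.

    Under H1 (outliers=True) each value receives N(0, delta^2) noise with
    probability p_outlier. Assumption: sampling without replacement, time order kept.
    """
    rng = rng or random.Random()
    t = sorted(rng.sample(range(len(series)), n))
    x = [series[i] for i in t]
    if outliers:
        x = [v + rng.gauss(0.0, delta) if rng.random() < p_outlier else v for v in x]
    return t, x

# Placeholder for the 106 annual temperature changes (hypothetical values).
series = [0.01 * ((7 * i) % 13 - 6) for i in range(106)]
rng = random.Random(0)
t0, x0 = simulate(series, rng=rng)                                # one H0 replicate
t1, x1 = simulate(series, outliers=True, p_outlier=1.0, rng=rng)  # every point perturbed
```

Setting `p_outlier=1.0` perturbs every point, which makes the H1 mechanism easy to inspect; the default 0.05 gives on average $0.05 \cdot 53 = 2.65$ outliers per replicate.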

For each simulation, we computed the following global statistics for outlier detection: the maximum $K_j$, the maximum absolute normalized z-score (using a three-component mixture model) and the maximum Local Outlier Factor (LOF) score computed with the R package Rlof [16] after rescaling both the year and temperature axes. Details on the statistics can be found in the Supplementary Material.

TABLE I: Performance of methods for outlier detection: empirical AUC with 95% confidence intervals, sample size = 400. $\delta$ is the standard deviation of the Gaussian noise characterizing outliers.

Method     δ = 0.5              δ = 2.0              δ = 3.0
KLD        0.62 [0.57, 0.68]    0.79 [0.74, 0.84]    0.86 [0.82, 0.90]
Z-value    0.58 [0.52, 0.64]    0.61 [0.55, 0.66]    0.59 [0.53, 0.65]
LOF        0.73 [0.68, 0.78]    0.93 [0.90, 0.96]    0.94 [0.91, 0.96]

The performance of the three global statistics for three different values of $\delta$ was assessed with the empirical AUC (computed with [17]); the results are reported in Table I. The statistic based on the Z-value performs very poorly (AUC at most 0.61), whereas the KLD-based statistic has fair to good discriminating power for $\delta \geq 2.0$ (AUC 0.79 and 0.86). However, the method consisting in computing the LOF scores after normalizing both axes performs best for each value of $\delta$ (AUC from 0.73 to 0.94). All three methods are very fast: it takes less than 0.5 seconds to generate a simulation and compute all three statistics.

IV. Conclusions 

An interesting question in Hidden Markov Models is assessing the relative importance of each observation with respect to the sequence of hidden states. In order to measure how influential the $j$-th observation is, we suggest using the Kullback-Leibler distance $K_j$ between the conditional distribution of the hidden sequence given the whole observation sequence and the conditional distribution of the hidden sequence given all the observations but the $j$-th one. The suggested measure of influence focuses on the posterior distribution of the hidden sequence rather than on the parameter estimates (as in sensitivity analysis), and it is therefore suitable for problems where the information of interest is the hidden sequence (speech recognition [3], genetics [11], bioinformatics [10]).

The most important contribution of this letter is a novel linear complexity algorithm for computing the measures of influence of all the observations. Our algorithm is based on simple recursions derived from the forward-backward algorithm for HMMs and can easily be extended to take into account pairs, triplets or $h$ consecutive observations. In this case the complexity is $O(nhm^2)$. The algorithm can also be extended to more complex configurations of observations; the resulting complexity depends on the combinatorics of the configuration.

We showed that the KLD influence measure can help to detect outliers in time series modeled by HMMs, the intuition being that anomalies should be more influential than other observations. In this context, the KLD-based method proves useful for global detection, even though it is less powerful than specific methods such as the LOF algorithm (after appropriate rescaling).

However, the main interest of the KLD measure of influence is the detection of individual observations which, rather than being outliers, are meaningful values playing a critical role in the problem under consideration. New knowledge can be uncovered by investigating the most influential observations found with our influence measure. For example, in the context of protein structure analysis, structural alphabets are encoded through HMMs [10]. Pointing out highly influential residues in the encoding through the KLD measure might reveal interesting structural properties (e.g. alternative 3D structures).

References 

[1] Y. Ephraim and N. Merhav, "Hidden Markov processes," IEEE Transactions on Information Theory, vol. 48, no. 6, pp. 1518-1569, 2002.

[2] L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.

[3] B. Gold, N. Morgan, and D. Ellis, Speech and Audio Signal Processing. Wiley Online Library, 2011.

[4] R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Jul. 1999.

[5] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2006.

[6] M. Do, "Fast approximation of Kullback-Leibler distance for dependence trees and hidden Markov models," IEEE Signal Processing Letters, vol. 10, no. 4, pp. 115-118, 2003.

[7] D. Cook, "Detection of Influential Observations in Linear Inference," Journal of Statistical Planning and Inference, vol. 37, pp. 51-68, 1977.

[8] W. Johnson, "Influence measures for logistic regression: Another point of view," Biometrika, vol. 72, no. 1, pp. 59-65, 1985.

[9] J. Fridlyand, A. Snijders, D. Pinkel, D. Albertson, and A. Jain, "Hidden Markov models approach to the analysis of array CGH data," Journal of Multivariate Analysis, vol. 90, no. 1, pp. 132-153, 2004.

[10] A. Camproux, R. Gautier, P. Tuffery et al., "A hidden Markov model derived structural alphabet for proteins," Journal of Molecular Biology, vol. 339, no. 3, pp. 591-606, 2004.

[11] Y. Li, C. Willer, J. Ding, P. Scheet, and G. Abecasis, "MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes," Genetic Epidemiology, vol. 34, no. 8, pp. 816-834, 2010.

[12] J. Silva and S. Narayanan, "Upper Bound Kullback-Leibler Divergence for Transient Hidden Markov Models," IEEE Transactions on Signal Processing, vol. 56, no. 9, pp. 4176-4188, 2008.

[13] S. M. E. Sahraeian and B.-J. Yoon, "A novel low-complexity HMM similarity measure," IEEE Signal Processing Letters, vol. 18, no. 2, pp. 87-90, 2011.

[14] R. J. Hyndman, "Time Series Data Library," http://data.is/TSDLdemo. Accessed on September 24, 2012.

[15] S. Chatterjee and A. Hadi, "Influential observations, high leverage points, and outliers in linear regression," Statistical Science, vol. 1, no. 3, pp. 379-393, 1986.

[16] Y. Hu, W. Murray, and Y. Shan, Rlof: R parallel implementation of Local Outlier Factor (LOF), 2011, R package version 1.0.0. [Online]. Available: http://CRAN.R-project.org/package=Rlof

[17] X. Robin, N. Turck, A. Hainard, N. Tiberti, F. Lisacek, J.-C. Sanchez, and M. Müller, "pROC: an open-source package for R and S+ to analyze and compare ROC curves," BMC Bioinformatics, vol. 12, p. 77, 2011.

[18] M. Markou and S. Singh, "Novelty detection: a review. Part 1: statistical approaches," Signal Processing, vol. 83, no. 12, pp. 2481-2497, 2003.

[19] D. Yeung and Y. Ding, "Host-based intrusion detection using dynamic and static behavioral models," Pattern Recognition, vol. 36, no. 1, pp. 229-243, 2003.

[20] D. Zhang, D. Gatica-Perez, S. Bengio, and I. McCowan, "Semi-supervised adapted HMMs for unusual event detection," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition CVPR 2005, vol. 1. IEEE, 2005, pp. 611-618.

[21] S. Shah, X. Xuan, R. DeLeeuw, M. Khojasteh, W. Lam, R. Ng, and K. Murphy, "Integrating copy number polymorphisms into array CGH analysis using a robust HMM," Bioinformatics, vol. 22, no. 14, pp. e431-e439, 2006.

[22] M. Siu and A. Chan, "A robust Viterbi algorithm against impulsive noise with application to speech recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 6, pp. 2122-2133, 2006.

[23] S. Chatzis and T. Varvarigou, "A Robust to Outliers Hidden Markov Model with Application in Text-Dependent Speaker Identification," in IEEE International Conference on Signal Processing and Communications ICSPC 2007. IEEE, 2007, pp. 804-807.

[24] S. Chatzis, D. Kosmopoulos, and T. Varvarigou, "Robust sequential data modeling using an outlier tolerant hidden Markov model," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 9, pp. 1657-1669, 2009.

[25] M. Breunig, H. Kriegel, R. Ng, and J. Sander, "LOF: identifying density-based local outliers," in ACM Sigmod Record, vol. 29, no. 2. ACM, 2000, pp. 93-104.



Original data 



Q 



1880 1900 



1920 1940 
Year 



— I 1 1 

1960 1980 



Original data with outliers 



Vittorio Perduca received the M.Sc. and Ph.D. in mathematics from the 
University of Turin (Italy) and the M.Sc. in computational biology from Paris 
Descartes University (France) in 2004, 2009 and 2011 respectively. 

Dr. Perduca's postdoctoral fellowship is supported by the Fondation Sciences Mathématiques de Paris; his research interests include belief propagation algorithms for Bayesian networks with applications in computational



Gregory Nuel received the Ph.D. in mathematics and the Habilitation à Diriger des Recherches in 2001 and 2007, respectively, both from the University of Evry (France).

Dr. Nuel currently works as a senior researcher for CNRS. His research interests include models with incomplete data, Bayesian networks, motifs in random sequences, and a wide range of biomedical applications.



Appendix 

Supplementary Material with Technical Details 

Application to Outlier Detection 

In this section, we give more details on our application of 
the KLD measure of influence to the detection of outliers in 
HMMs. 

A few methods based on the HMM have been developed for outlier detection [18], [19], [20]. In the main paper we consider a related yet different problem, namely the detection of outliers in data modeled with the HMM, for instance time series. This problem is not new in the literature: for instance, an ad hoc model for outliers in data modeled by HMMs was introduced in a Bayesian framework in [21]. Other authors suggested tackling the problem by means of a robust Viterbi algorithm performing joint decoding and outlier detection during the Viterbi search [22]. Following [15] (in the context of linear regression), we suggest detecting outliers in HMMs by means of our KLD-based measure of observation influence.

An outlier is an observation that is not generated by the underlying statistical model. Since HMMs are intrinsically heterogeneous, the detection of outliers in data modeled by HMMs is a challenging problem. For instance, change-point detection methods based on HMMs are known to be particularly sensitive to the presence of outliers, in the sense that a single outlier can result in a segment consisting of just one point [23], [24].

As explained in the main paper, we expect the KLD $K_j$ to be significantly larger when $X_j = x_j$ is an outlier.



Fig. 4: KLD function $(K_j)_{j=1,\dots,n}$ for the original time series (top) and for the time series with two outliers artificially added (bottom, outliers shown as triangles): $x_{1884} = 0.2$, $x_{1939} = -0.6$.

A. Illustration 

Consider the same time series, consisting of 106 annual changes in global temperature between 1880 and 1985, as in the main paper. The dataset can be modeled with an HMM in which each observation follows a Gaussian distribution whose mean depends on the corresponding hidden state. The upper plot in Fig. 4 shows the KLD function computed after estimating the parameters with the EM algorithm (the same function as in the upper plot of Fig. 2 in the main text, but on a different scale). We assume that the original data are outlier-free (as explained in the main text, the peaks in the figure can be interpreted as pointers to meaningful observations). However, if the dataset contains outliers, can we detect them with our measure of influence based on the KLD? In order to answer this question, we manually added two outliers and recomputed the KLD function after re-estimating the parameters. The results are depicted in the lower plot of Fig. 4 and clearly show that the KLD function has peaks at the two outliers.

B. Comparison with other methods 

We give here the details of the empirical comparison study whose results are reported in the main paper.

Data. We considered semi-parametric simulations based on the time series of changes in global temperature described above. The original data are assumed to be free of outliers. We obtained 1000 simulations under the null hypothesis H0 (no outliers) by randomly sampling $n = 106/2 = 53$ data points from the original time series, and 1000 simulations under the alternative hypothesis H1 (presence of outliers) by sampling $n = 53$ data points from the original time series and adding Gaussian noise $\mathcal{N}(0, \delta^2)$ to each of them with probability 0.05. Hence, the average number of outliers in each H1 simulation is $0.05 \cdot n = 2.65$.
We tested the global hypothesis H1 that the data contain at least one outlier against the hypothesis H0 that there are no outliers with the following alternative methods:

KLD-based method. For each simulation $q$ we estimated the parameters of the HMM modeling the dataset with the EM algorithm and then computed the global statistic

$$T_q = \max_{j=1,\dots,n} K_j.$$

Z-value. For each simulation $q$ we clustered the data with the $k$-means algorithm ($k = 3$) and then computed the Z-value $Z_j = (x_j - \mu)/\sigma$ of each data point $x_j$ with respect to the mean $\mu$ and standard deviation $\sigma$ of its cluster. We considered the global statistic

$$S_q = \max_{j=1,\dots,n} |Z_j|.$$
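The Z-value statistic can be sketched in Python with the standard library only. Assumptions not fixed by the text: a plain Lloyd iteration for $k$-means with quantile-based initial centers, the sample standard deviation within each cluster, and singleton clusters skipped because their standard deviation is undefined. The data and function names are hypothetical.

```python
import statistics

def kmeans_1d(x, k=3, iters=100):
    """Lloyd's k-means on scalars; returns a cluster label for each value."""
    xs = sorted(x)
    centers = [xs[int((c + 0.5) * len(xs) / k)] for c in range(k)]  # quantile start (assumption)
    labels = [0] * len(x)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in x]
        new = [statistics.mean([v for v, lab in zip(x, labels) if lab == c])
               if c in labels else centers[c] for c in range(k)]
        if new == centers:
            break
        centers = new
    return labels

def max_abs_z(x, k=3):
    """S_q = max_j |Z_j|, with Z_j = (x_j - mu) / sigma taken within x_j's cluster."""
    labels = kmeans_1d(x, k)
    best = 0.0
    for c in set(labels):
        members = [v for v, lab in zip(x, labels) if lab == c]
        if len(members) < 2:
            continue  # assumption: sigma is undefined for a singleton cluster, skip it
        mu, sd = statistics.mean(members), statistics.stdev(members)
        if sd > 0:
            best = max(best, max(abs(v - mu) / sd for v in members))
    return best

# Hypothetical data with three well-separated groups.
x = [0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 10.0, 10.2, 9.8]
```

Under this convention an isolated outlier that forms its own cluster receives no Z-value at all, a limitation worth keeping in mind when interpreting $S_q$.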

Local Outlier Factor (LOF). The LOF algorithm is a density-based method [25]. For each data point, the LOF score is calculated by comparing the local density of the point (defined as the inverse of the average distance from its $r$-nearest neighbors) to the average of the densities of its neighbors. The score is interpreted as a measure of whether the point is in a denser or sparser region of the dataset. A ranking of the points as outliers is obtained by sorting them according to their LOF scores.

The LOF score depends on the choice of the parameter $r$ (the number of nearest neighbors); as suggested in [25], for each point we took the maximal LOF score over a range of integer values for $r$, namely $r \in \{10, \dots, 20\}$. We considered the global statistic

$$L_q = \max_{j=1,\dots,n} \; \max_{r=10,\dots,20} \mathrm{LOF}_r(\tilde{x}_j, \tilde{t}_j),$$

where $\tilde{x}_j$ and $\tilde{t}_j$ are the standardized values of the observation $x_j$ and of its year $t_j$ (i.e. we rescaled both axes before computing the LOF scores). The LOF scores were computed using the R package Rlof [16].
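The study relies on the R package Rlof [16]; the Python sketch below is an independent, simplified re-implementation of the LOF score of [25], not the Rlof code. It standardizes the year and value axes, uses exactly $r$ nearest neighbors (ignoring distance ties, which the original definition includes in the neighborhood) and takes the maximum over $r \in \{10, \dots, 20\}$ to form $L_q$. Data and function names are hypothetical.

```python
import math
import statistics

def standardize(v):
    """Rescale one axis to zero mean and unit (population) standard deviation."""
    mu, sd = statistics.mean(v), statistics.pstdev(v)
    return [(a - mu) / sd for a in v]

def lof_scores(points, k):
    """Simplified LOF: exactly k nearest neighbours per point, distance ties ignored."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
           for i in range(n)]
    k_dist = [d[i][knn[i][-1]] for i in range(n)]  # distance to the k-th neighbour
    # local reachability density = 1 / mean reachability distance to the k neighbours
    lrd = [k / sum(max(k_dist[o], d[i][o]) for o in knn[i]) for i in range(n)]
    # LOF = mean density of the neighbours divided by the point's own density
    return [statistics.mean(lrd[o] for o in knn[i]) / lrd[i] for i in range(n)]

def lof_statistic(t, x, r_values=range(10, 21)):
    """L_q: maximum LOF over points and over r, on standardized (t, x) pairs."""
    pts = list(zip(standardize(t), standardize(x)))
    return max(max(lof_scores(pts, r)) for r in r_values)

# Hypothetical data: 30 yearly values with one gross outlier at position 15.
t = list(range(1880, 1910))
x = [0.1 * math.sin(i) for i in range(30)]
x[15] = 5.0
scores = lof_scores(list(zip(standardize(t), standardize(x))), 10)
stat = lof_statistic(t, x)
```

Standardizing both axes matters here: without it, the year axis (spanning decades) would dominate the Euclidean distances and mask deviations in the temperature values.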

ROC analysis. We assessed the performance of each method by means of the empirical Area Under the Curve (AUC). The AUC measures the area under the Receiver Operating Characteristic (ROC) curve and can be qualitatively interpreted as follows: $\mathrm{AUC} \le 0.6$ means "fail"; $0.6 < \mathrm{AUC} \le 0.7$ means "poor"; $0.7 < \mathrm{AUC} \le 0.8$ means "fair"; $0.8 < \mathrm{AUC} \le 0.9$ means "good"; $0.9 < \mathrm{AUC} \le 1.0$ means "excellent". AUC computations were performed with the R package pROC [17] using the statistics computed for each method and simulation.
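For readers who want to reproduce the ROC analysis outside R, the empirical AUC coincides with the normalized Mann-Whitney statistic: the probability that a statistic computed under H1 exceeds one computed under H0, with ties counted as one half. The Python sketch below, with hypothetical function names, computes it from two lists of global statistics and maps the value onto the qualitative scale above.

```python
def empirical_auc(stats_h0, stats_h1):
    """Mann-Whitney estimate of the AUC: P(T_H1 > T_H0), ties counted as 1/2."""
    wins = 0.0
    for a in stats_h1:
        for b in stats_h0:
            wins += 1.0 if a > b else (0.5 if a == b else 0.0)
    return wins / (len(stats_h0) * len(stats_h1))

def auc_label(auc):
    """Qualitative scale used in the text (upper bounds inclusive)."""
    for upper, label in [(0.6, "fail"), (0.7, "poor"), (0.8, "fair"), (0.9, "good")]:
        if auc <= upper:
            return label
    return "excellent"
```

For example, applied to the KLD column of Table I this scale reads "poor", "fair" and "good" for $\delta = 0.5$, $2.0$ and $3.0$. The double loop is $O(N_0 N_1)$, which is negligible for 1000 simulations per hypothesis.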



